## IN THE CLAIMS

- 1. (Currently Amended) A <u>computer-implemented</u> method, comprising:
  - identifying a candidate representing a plurality of instructions of a plurality of threads
    that perform one or more external memory accesses, the <u>one or more</u> external
    memory accesses having <u>an</u> identical base address, <u>including</u>
    - partitioning the plurality of instructions of the external memory accesses into
      one or more sets of potential candidates based on dependency
      relationships of the instructions,
    - converting addresses of each of the external memory accesses into a form having a base address and an offset,
    - grouping multiple potential candidates having substantially an identical base
      address into a single candidate, wherein a group having most of the
      potential candidates is selected as a final candidate for caching, and
    - selecting one of the potential candidate sets as the candidate, instructions of the
      candidate satisfying a predetermined dependency relationship; and
  - inserting at least one of directives and instructions into an instruction stream corresponding to the identified candidate to maintain contents of at least one of a content addressable memory (CAM) and local memory (LM) of a processor and to modify at least one of the external memory accesses to access at least one of the CAM and LM of the processor without having to perform the respective external memory access.

- 2. (Cancelled)
- 3. (Cancelled)

- 4. (Currently Amended) The method of <u>claim 1</u>, wherein the base address is a non-constant part and the offset is a constant part of the converted address.
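
The base/offset conversion and grouping recited in claims 1 and 4 can be illustrated with a minimal Python sketch. It is only one possible reading and not the claimed implementation: the representation of an address as a list of terms, the function names `split_address` and `pick_candidate`, and the access labels are all assumptions made for the example.

```python
from collections import defaultdict


def split_address(terms):
    """Split an address into (base, offset).

    Assumption: an address is a list of terms, where a str is a
    non-constant (symbolic) part and an int is a constant part.
    """
    base = tuple(sorted(t for t in terms if isinstance(t, str)))
    offset = sum(t for t in terms if isinstance(t, int))
    return base, offset


def pick_candidate(accesses):
    """Group accesses by base address and return the largest group."""
    groups = defaultdict(list)
    for label, terms in accesses.items():
        base, offset = split_address(terms)
        groups[base].append((label, offset))
    # The group holding the most accesses is taken as the caching candidate.
    best_base = max(groups, key=lambda b: len(groups[b]))
    return best_base, sorted(groups[best_base])


# Hypothetical accesses from two threads; labels and symbols are made up.
accesses = {
    "t0.load_a": ["flow_ptr", 0],
    "t0.load_b": ["flow_ptr", 4, 4],
    "t1.store_c": [8, "flow_ptr"],
    "t1.load_d": ["pkt_ptr", 16],
}
base, members = pick_candidate(accesses)
print(base, members)
```

In this sketch the three accesses built on `flow_ptr` fall into one group and the single `pkt_ptr` access into another, so the `flow_ptr` group would be the one selected.
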
- 5. (Currently Amended) A computer-implemented method, comprising:
  - identifying a candidate representing a plurality of instructions of a plurality of threads
    that perform one or more external memory accesses, the one or more external
    memory accesses having an identical base address, including
    - partitioning the plurality of instructions of the external memory accesses into
      one or more sets of potential candidates based on dependency
      relationships of the instructions,
    - converting addresses of each of the external memory accesses into a form having a base address and an offset,
    - screening out one or more ineligible candidates from the potential candidates,
      wherein the ineligible candidates include a base address that is different
      from a remainder of the potential candidates, and
    - selecting one of the potential candidate sets as the candidate, instructions of the candidate satisfying a predetermined dependency relationship,
  - inserting at least one of directives and instructions into an instruction stream
    corresponding to the identified candidate to maintain contents of at least one of
    a content addressable memory (CAM) and local memory (LM) of a processor
    and to modify at least one of the external memory accesses to access at least
    one of the CAM and LM of the processor without having to perform the
    respective external memory access.
- 6. (Cancelled)

- 7. (Currently Amended) The method of claim 1, wherein the identifying the candidate further comprises:
  - performing a copy-forward transformation on addresses of each of the external memory accesses; and
  - performing at least one of a global value numbering operation and a constant folding operation for each thread.
- 8. (Currently Amended) A computer-implemented method, comprising:
  - identifying a candidate representing a plurality of instructions of a plurality of threads
    that perform one or more external memory accesses, the one or more external
    memory accesses having an identical base address, including
    - partitioning the plurality of instructions of the external memory accesses into
      one or more sets of potential candidates based on dependency
      relationships of the instructions,
    - converting addresses of each of the external memory accesses into a form having a
      base address and an offset, and
    - selecting one of the potential candidate sets as the candidate, instructions of the candidate satisfying a predetermined dependency relationship; and
  - inserting at least one of directives and instructions into an instruction stream
    corresponding to the identified candidate to maintain contents of at least one of
    a content addressable memory (CAM) and local memory (LM) of a processor
    and to modify at least one of the external memory accesses to access at least
    one of the CAM and LM of the processor without having to perform the
    respective external memory access, including
    - for each thread, reserving a sufficient space in the local memory to store data portions of cache lines, and
    - inserting a caching instruction prior to each of the external memory accesses.
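
The insertion step of claim 8 can be pictured as a rewrite of the instruction stream. The following Python sketch is an assumption-laden illustration, not the claimed implementation: the tuple encoding of instructions, the opcode names (`ext_load`, `ext_store`, `lm_load`, `lm_store`, `cache_lookup`, `lm_reserve`), and the reservation size `line_words` are all hypothetical.

```python
def insert_caching(stream, candidate_base, line_words=4):
    """Return a rewritten stream for one thread.

    Assumption: each instruction is a tuple (op, base, offset). A
    hypothetical `cache_lookup` is placed before every external access
    whose base matches the candidate, and that access is redirected to
    local memory.
    """
    # Reserve local-memory space for the cached data (size is assumed).
    out = [("lm_reserve", line_words)]
    for op, base, offset in stream:
        if op in ("ext_load", "ext_store") and base == candidate_base:
            out.append(("cache_lookup", base))
            out.append((op.replace("ext_", "lm_"), base, offset))
        else:
            out.append((op, base, offset))
    return out


stream = [
    ("ext_load", "flow_ptr", 0),
    ("alu", None, None),
    ("ext_store", "flow_ptr", 8),
    ("ext_load", "pkt_ptr", 16),
]
rewritten = insert_caching(stream, "flow_ptr")
for instr in rewritten:
    print(instr)
```

Accesses whose base differs from the candidate (here the `pkt_ptr` load) are left as external accesses in this sketch.
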

- 9. (Original) The method of claim 8, further comprising seeking the base address of each external memory access in the CAM to determine whether the CAM includes an entry that contains the base address being sought.
- 10. (Original) The method of claim 9, wherein if the CAM includes an entry containing the base address being sought, the method further comprises:
  - determining an offset of the local memory based on the entry of the CAM containing the base address being sought; and
  - accessing data from an entry of the local memory referenced by the determined offset.

- 11. (Original) The method of claim 9, wherein if the CAM does not include an entry containing the base address being sought, the method further comprises allocating a least recently used (LRU) entry of the CAM having a base address of a previous external memory access.
- 12. (Original) The method of claim 11, further comprising:
  - loading data of a current external memory access from the external memory into an entry of the local memory referenced by the allocated LRU entry; and
  - storing the base address of the current external memory access in the LRU entry of the CAM replacing the base address of the previous external memory access.
- 13. (Original) The method of claim 11, further comprising:
  - examining the base address of the previous external memory access in the allocated
    LRU entry to determine whether the base address is valid; and
  - replicating data of an entry in the local memory corresponding to the allocated LRU entry to a location of the external memory based address of the previous external memory access.
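
The run-time behavior recited in claims 9 through 13 (CAM lookup, hit, LRU allocation on a miss, fill, and write-back) can be sketched in Python as follows. This is a simplified model under stated assumptions, not the claimed implementation: the class name `SoftwareCache`, the use of a dict to stand in for external memory, one local-memory slot per CAM entry, and reading "replicating" as a write-back are all choices made for the example.

```python
class SoftwareCache:
    """Toy model of a CAM-tagged cache held in local memory."""

    def __init__(self, external, entries=2):
        self.external = external         # dict standing in for external memory
        self.cam = [None] * entries      # CAM tags: a base address, or None if invalid
        self.lm = [None] * entries       # local-memory slots, one per CAM entry
        self.lru = list(range(entries))  # entry indices, least recently used first

    def _touch(self, idx):
        # Mark the entry as most recently used.
        self.lru.remove(idx)
        self.lru.append(idx)

    def lookup(self, base):
        """Return the local-memory slot for `base`, filling it on a miss."""
        if base in self.cam:                    # hit: CAM entry gives the slot
            idx = self.cam.index(base)
        else:                                   # miss: allocate the LRU entry
            idx = self.lru[0]
            old = self.cam[idx]
            if old is not None:                 # previous base valid: write data back
                self.external[old] = self.lm[idx]
            self.lm[idx] = self.external[base]  # load data of the current access
            self.cam[idx] = base                # replace the previous base address
        self._touch(idx)
        return idx

    def read(self, base):
        return self.lm[self.lookup(base)]

    def write(self, base, value):
        self.lm[self.lookup(base)] = value


ext = {"A": 1, "B": 2, "C": 3}
cache = SoftwareCache(ext, entries=2)
cache.write("A", 10)       # miss, fill, then update the cached copy only
b_value = cache.read("B")  # miss into the second entry
c_value = cache.read("C")  # miss that evicts "A" and writes 10 back
print(ext["A"], cache.cam)
```

In this model a written value reaches external memory only when its entry is evicted, which is one way to read the replication step of claim 13.

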
- 14. (Currently Amended) A machine-readable <u>storage</u> medium having executable code to cause a machine to perform a method, the method comprising:
  - identifying a candidate representing a plurality of instructions of a plurality of threads
    that perform one or more external memory accesses, the one or more external
    memory accesses having an identical base address, including
    - partitioning the plurality of instructions of the external memory accesses into
      one or more sets of potential candidates based on dependency
      relationships of the instructions,
    - converting addresses of each of the external memory accesses into a form having a base address and an offset,
    - grouping multiple potential candidates having substantially an identical base
      address into a single candidate, wherein a group having most of the
      potential candidates is selected as a final candidate for caching, and
    - selecting one of the potential candidate sets as the candidate, instructions of the
      candidate satisfying a predetermined dependency relationship; and
  - inserting at least one of directives and instructions into an instruction stream
    corresponding to the identified candidate to maintain contents of at least one of
    a content addressable memory (CAM) and local memory (LM) of a processor
    and to modify at least one of the external memory accesses to access at least
    one of the CAM and LM of the processor without having to perform the
    respective external memory access.


- 15. (Cancelled)
- 16. (Cancelled)
- 17. (Currently Amended) A machine-readable storage medium having executable code to cause a machine to perform a method, the method comprising:
  - identifying a candidate representing a plurality of instructions of a plurality of threads
    that perform one or more external memory accesses, the one or more external
    memory accesses having an identical base address, including
    - partitioning the plurality of instructions of the external memory accesses into
      one or more sets of potential candidates based on dependency
      relationships of the instructions,
    - converting addresses of each of the external memory accesses into a form having a base address and an offset,
    - screening out one or more ineligible candidates from the potential candidates,
      wherein the ineligible candidates include a base address that is different
      from a remainder of the potential candidates, and
    - selecting one of the potential candidate sets as the candidate, instructions of the
      candidate satisfying a predetermined dependency relationship,
  - inserting at least one of directives and instructions into an instruction stream
    corresponding to the identified candidate to maintain contents of at least one of
    a content addressable memory (CAM) and local memory (LM) of a processor
    and to modify at least one of the external memory accesses to access at least
    one of the CAM and LM of the processor without having to perform the
    respective external memory access.

- 18. (Cancelled)

- 19. (Currently Amended) A machine-readable storage medium having executable code to cause a machine to perform a method, the method comprising:
  - identifying a candidate representing a plurality of instructions of a plurality of threads
    that perform one or more external memory accesses, the one or more external
    memory accesses having an identical base address, including
    - partitioning the plurality of instructions of the external memory accesses into
      one or more sets of potential candidates based on dependency
      relationships of the instructions,
    - converting addresses of each of the external memory accesses into a form having a base address and an offset, and
    - selecting one of the potential candidate sets as the candidate, instructions of the candidate satisfying a predetermined dependency relationship; and
  - inserting at least one of directives and instructions into an instruction stream corresponding to the identified candidate to maintain contents of at least one of a content addressable memory (CAM) and local memory (LM) of a processor and to modify at least one of the external memory accesses to access at least one of the CAM and LM of the processor without having to perform the respective external memory access, including
    - for each thread, reserving a sufficient space in the local memory to store data portions of cache lines, and
    - inserting a caching instruction prior to each of the external memory accesses.

- 20. (Currently Amended) The machine-readable storage medium of claim 19, wherein the method further comprises seeking the base address of each external memory access in the CAM to determine whether the CAM includes an entry that contains the base address being sought.
- 21. (Currently Amended) The machine-readable storage medium of claim 20, wherein if the CAM includes an entry containing the base address being sought, the method further comprises:
  - determining an offset of the local memory based on the entry of the CAM containing the base address being sought; and
  - accessing data from an entry of the local memory referenced by the determined offset.

- 22. (Currently Amended) The machine-readable <u>storage</u> medium of claim 20, wherein if the CAM does not include an entry containing the base address being sought, the method further comprises allocating a least recently used (LRU) entry of the CAM having a base address of a previous external memory access.
- 23. (Currently Amended) The machine-readable <u>storage</u> medium of claim 22, wherein the method further comprises:

  - loading data of a current external memory access from the external memory into an entry of the local memory referenced by the allocated LRU entry; and
  - storing the base address of the current external memory access in the LRU entry of the CAM replacing the base address of the previous external memory access.

- 24. (Currently Amended) The machine-readable <u>storage</u> medium of claim 22, wherein the method further comprises:
  - examining the base address of the previous external memory access in the allocated LRU entry to determine whether the base address is valid; and
  - replicating data of an entry in the local memory corresponding to the allocated LRU entry to a location of the external memory based address of the previous external memory access.
- 25-30. (Cancelled)
- 31. (New) A data processing system, comprising:
  - a processor; and
  - a memory for storing instructions, which when executed from the memory, cause the processor to perform operations, including
    - identifying a candidate representing a plurality of instructions of a plurality of threads that perform one or more external memory accesses, the one or more external memory accesses having an identical base address, including
      - partitioning the plurality of instructions of the external memory
        accesses into one or more sets of potential candidates based on
        dependency relationships of the instructions,
      - converting addresses of each of the external memory accesses into a form having a base address and an offset,
      - grouping multiple potential candidates having substantially an identical base address into a single candidate, wherein a group having most of the potential candidates is selected as a final candidate for caching, and
      - selecting one of the potential candidate sets as the candidate,
        instructions of the candidate satisfying a predetermined
        dependency relationship; and
    - inserting at least one of directives and instructions into an instruction stream corresponding to the identified candidate to maintain contents of at least one of a content addressable memory (CAM) and local memory (LM) of a processor and to modify at least one of the external memory accesses to access at least one of the CAM and LM of the processor without having to perform the respective external memory access.

- 32. (New) A data processing system, comprising:
  - a processor; and
  - a memory for storing instructions, which when executed from the memory, cause the processor to perform operations, including
    - identifying a candidate representing a plurality of instructions of a plurality of threads that perform one or more external memory accesses, the one or more external memory accesses having an identical base address, including
      - partitioning the plurality of instructions of the external memory
        accesses into one or more sets of potential candidates based on
        dependency relationships of the instructions,
      - converting addresses of each of the external memory accesses into a form having a base address and an offset,
      - screening out one or more ineligible candidates from the potential candidates, wherein the ineligible candidates include a base address that is different from a remainder of the potential candidates, and
      - selecting one of the potential candidate sets as the candidate, instructions of the candidate satisfying a predetermined dependency relationship,
    - inserting at least one of directives and instructions into an instruction stream corresponding to the identified candidate to maintain contents of at least one of a content addressable memory (CAM) and local memory (LM) of a processor and to modify at least one of the external memory accesses to access at least one of the CAM and LM of the processor without having to perform the respective external memory access.

- 33. (New) A data processing system, comprising:
  - a processor; and
  - a memory for storing instructions, which when executed from the memory, cause the processor to perform operations, including
    - identifying a candidate representing a plurality of instructions of a plurality of threads that perform one or more external memory accesses, the one or more external memory accesses having an identical base address, including
      - partitioning the plurality of instructions of the external memory
        accesses into one or more sets of potential candidates based on
        dependency relationships of the instructions,
      - converting addresses of each of the external memory accesses into a form having a base address and an offset, and
      - selecting one of the potential candidate sets as the candidate, instructions of the candidate satisfying a predetermined dependency relationship; and
    - inserting at least one of directives and instructions into an instruction stream corresponding to the identified candidate to maintain contents of at least one of a content addressable memory (CAM) and local memory (LM) of a processor and to modify at least one of the external memory accesses to access at least one of the CAM and LM of the processor without having to perform the respective external memory access, including
      - for each thread, reserving a sufficient space in the local memory to store data portions of cache lines, and
      - inserting a caching instruction prior to each of the external memory accesses.